{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tree Shap\n",
    "\n",
    "To get a sense of the Shap library that we'll be using, let's implement the simple version of the Tree Shap algorithm.  This is based on Scott Lundberg's paper [Consistent Individualized Feature Attribution for Tree\n",
    "Ensembles](https://arxiv.org/pdf/1802.03888.pdf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Training and re-using a single tree model\n",
    "\n",
    "You may notice that calculating Shap values for every feature, and for every individual data point, is very computationally expensive.  For example, we would be training multiple models just to calculate the importance of one feature.\n",
    "\n",
    "With decision trees, we can actually train a decision tree on all the features, and then re-use that single tree to calculate shapley values using subsets of that single tree.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![alt text](./tree_shap_images/tree_shap_img_01.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For some intuition, let’s look at a model that has just two features.  The tree splits on feature one first, and then on feature 2. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_02.png](./tree_shap_images/tree_shap_img_02.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we wanted to see how a model would perform if it only used the first feature, we could look at the subtree that consists of the top three nodes of this tree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_03.png](./tree_shap_images/tree_shap_img_03.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, if we wanted to see how a model would make predictions if it used only feature 2, we could look at the subtree containing the bottom three nodes, starting at the node that splits on feature 2.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_04.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Another example with 3 features\n",
    "Now let’s look at a tree that is trained on three features.  Let’s say it splits on feature 1, then on feature 2, then on feature 3.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_05.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How do we simulate the prediction of a tree that was only trained on features 1 and 3, but not on feature 2?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_06.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, if we didn’t split on feature 2, that means that we would include the training samples in both the left and right sub-tree of that node when making a prediction.  This is how we can simulate that the tree never split on feature 2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_07.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s also think about how we handle the predictions when we do split on the feature.  If we split on feature 3, and the particular data point we’re making a prediction for ends up in the left child node, then we can use the prediction based on training samples in the left sub-tree, and ignore the training samples in the right sub-tree.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_08.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We’ll walk through the algorithm to do this, and then you’ll get to practice this yourself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Algorithm\n",
    "\n",
    "Here’s the algorithm used to calculate the prediction of a tree, given a subset of features.  You can check out the paper [Consistent Individualized Feature Attribution for Tree Ensembles](https://arxiv.org/pdf/1802.03888.pdf), page 4 algorithm 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_09.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, $G$ is a function that gets called recursively to walk down the tree starting at the root node.  $w$ is the weight given to the predictions of each node.  $v$ is the prediction of a leaf node.  $r_{a_j}$ and $r_{b_j}$ are the number of data points in the left and right child nodes of node $j$.  $r_j$ is the number of data points in node $j$.\n",
    "\n",
    "We can use this to walk through a decision tree that is trained on all features, and calculate the prediction of a tree that would have been created from a subset of the features.\n",
    "\n",
    "Let's look at specific parts of this algorithm in more detail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### leaf nodes\n",
    "Let's look at the the line that handles leaf nodes.  It takes the prediction of that leaf node and multiplies it by some weight.  The weight is determined by the proportion of training data points that end up reaching that leaf node."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_10.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ignoring a feature\n",
    "Next, let’s look at the case when the feature that’s used at a node is not within the subset of features that we want to split on.  In other words, we want to pretend that we didn’t train the model on this feature.  In that case, in order to pretend that we’re not splitting on that feature, we take the sum of the weighted predictions from both its left and right subtree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_11.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### including a feature\n",
    "\n",
    "Finally, for cases when the feature at that node is within the subset of features that we want to use, then we can follow just the left subtree or just the right subtree, whichever path that the input data gets assigned to by the split."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![tree_shap_img_04.png](./tree_shap_images/tree_shap_img_12.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Implement it in code!\n",
    "You’ll get to practice this algorithm!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install numpy==1.14.5\n",
    "!{sys.executable} -m pip install scikit-learn==0.19.1\n",
    "!{sys.executable} -m pip install graphviz==0.9\n",
    "!{sys.executable} -m pip install shap==0.25.2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sklearn.ensemble\n",
    "import shap\n",
    "import numpy as np\n",
    "import graphviz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## generate sample data\n",
    "\n",
    "Feature 0 and feature 1 form the AND operator, and feature 2 does not contribute to the prediction of the label, because it's always zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# AND case (features 0 and 1)\n",
    "N = 100\n",
    "M = 3\n",
    "X = np.zeros((N,M))\n",
    "X.shape\n",
    "y = np.zeros(N)\n",
    "X[:1 * N//4, 1] = 1\n",
    "X[:N//2, 0] = 1\n",
    "X[N//2:3 * N//4, 1] = 1\n",
    "y[:1 * N//4] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train a decision tree\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fit model\n",
    "model = sklearn.tree.DecisionTreeRegressor(random_state=0)\n",
    "model.fit(X, y)\n",
    "\n",
    "# draw model\n",
    "dot_data = sklearn.tree.export_graphviz(model, out_file=None, filled=True, rounded=True, special_characters=True)  \n",
    "graph = graphviz.Source(dot_data)  \n",
    "graph "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tree attributes\n",
    "\n",
    "[sklearn.tree.tree._tree](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx)\n",
    "\n",
    "```\n",
    "The binary tree is represented as a number of parallel arrays. The i-th\n",
    "    element of each array holds information about the node `i`. Node 0 is the\n",
    "    tree's root.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tree0 = model.tree_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  left and right child nodes\n",
    "```\n",
    "children_left : array of int, shape [node_count]\n",
    "        children_left[i] holds the node id of the left child of node i.\n",
    "        For leaves, children_left[i] == TREE_LEAF. Otherwise,\n",
    "        children_left[i] > i. This child handles the case where\n",
    "        X[:, feature[i]] <= threshold[i].\n",
    "    children_right : array of int, shape [node_count]\n",
    "        children_right[i] holds the node id of the right child of node i.\n",
    "        For leaves, children_right[i] == TREE_LEAF. Otherwise,\n",
    "        children_right[i] > i. This child handles the case where\n",
    "        X[:, feature[i]] > threshold[i].\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"tree0.children_left: {tree0.children_left}\")\n",
    "print(f\"tree0.children_right: {tree0.children_right}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### features\n",
    "```\n",
    "    feature : array of int, shape [node_count]\n",
    "        feature[i] holds the feature to split on, for the internal node i.\n",
    "    threshold : array of double, shape [node_count]\n",
    "        threshold[i] holds the threshold for the internal node i.\n",
    "    value : array of double, shape [node_count, n_outputs, max_n_classes]\n",
    "        Contains the constant prediction value of each node.\n",
    "    impurity : array of double, shape [node_count]\n",
    "        impurity[i] holds the impurity (i.e., the value of the splitting\n",
    "        criterion) at node i.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"tree0.feature: {tree0.feature}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For Node 0, feature 1 is used to split the data . For Node 2, feature 0 is used for splitting.  For the other nodes (1, 3, 4), there are no features used for splitting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Thresholds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"tree0.threshold: {tree0.threshold}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The threshold divides the data points using the chosen feature.  <= 0.5 and the data go to the left child; > .5 and the data go in the right child.  The -2 is for nodes that don't split on any feature."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"tree0.value : \\n{tree0.value}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`value` is the average prediction for each node.  Node 0 predicts 0.25 on average.  Node 2 predicts 0.5 on average."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### node samples\n",
    "```\n",
    "n_node_samples : array of int, shape [node_count]\n",
    "        n_node_samples[i] holds the number of training samples reaching node i.\n",
    "\n",
    "weighted_n_node_samples : array of int, shape [node_count]\n",
    "        weighted_n_node_samples[i] holds the weighted number of training samples\n",
    "        reaching node i.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"tree0.n_node_samples : {tree0.n_node_samples}\")\n",
    "print(f\"tree0.weighted_n_node_samples : {tree0.weighted_n_node_samples}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The n_node_samples counts how many data points from the parent gets passed to each child node.  `weighted_n_node_samples` is the same in this case, because it's a single decision tree.  If this were a tree within a random forest, 2/3 of the training data may be sampled and used to train a tree.  The `weighted_n_node_samples` would then be re-scaled to equal the total sample size.  We can use either in the calculations we'll do below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Quiz\n",
    "What proportion of samples went to the left child and right child of the root node?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO\n",
    "proportion_in_left_child = # ...\n",
    "proportion_in_right_child = # ...\n",
    "print(f\"proportion of samples in left child of root node {proportion_in_left_child}\")\n",
    "print(f\"proportion of samples in left child of root node {proportion_in_right_child}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrap with Tree class\n",
    "\n",
    "To make the tree object easier to work with, we'll wrap it inside our custom Tree class.  Please complete the functions within the Tree class below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Bonus Challenge:** Try implementing your own wrapper class for the tree object.\n",
    "\n",
    "Think about attributes that you may need in order to implement algorithm 1.  For example, how do we know when a node is an internal or leaf node?  What fraction of samples are in the left child relative to its parent node?  On which node is each feature split on?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Challenge: try implementing your own wrapper class\n",
    "\"\"\"\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you prefer some starter code:** You can also use the starter code below if you prefer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NO_NODE = -1\n",
    "NO_FEATURE = -2\n",
    "class Tree:\n",
    "    \n",
    "    def __init__(self, tree):\n",
    "        if str(type(tree)).endswith(\"'sklearn.tree._tree.Tree'>\"):\n",
    "            self.weight = 1 \n",
    "            self.children_left = tree.children_left\n",
    "            self.children_right = tree.children_right\n",
    "            self.features = tree.feature\n",
    "            self.thresholds = tree.threshold\n",
    "            self.values = tree.value[:,0,0] # tree.value is n by 1 by 1; get the n prediction values (there are n nodes in tree)\n",
    "            self.n_node_samples = tree.n_node_samples # actual number of nodes\n",
    "            self.node_sample_weight = tree.weighted_n_node_samples #rescaled number of nodes\n",
    "\n",
    "    def is_internal(self,i):\n",
    "        return (self.children_left[i] != NO_NODE or self.children_right[i] != NO_NODE)\n",
    "\n",
    "    def is_leaf(self,i):\n",
    "        # TODO\n",
    "        \n",
    "            \n",
    "    def left_child(self,i):\n",
    "        return self.children_left[i]\n",
    "\n",
    "    def right_child(self,i):\n",
    "        # TODO\n",
    "        \n",
    "\n",
    "    def proportion_of_samples_in_left_child(self,i):\n",
    "        # TODO\n",
    "        \n",
    "    \n",
    "    def proportion_of_samples_in_right_child(self,i):\n",
    "        # TODO\n",
    "        \n",
    "\n",
    "    def node_prediction(self,i):\n",
    "        return self.values[i]\n",
    "\n",
    "    def feature_that_split_node_i(self,i):\n",
    "        return self.features[i]\n",
    "\n",
    "    def threshold_at_node_i(self,i):\n",
    "        return self.thresholds[i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tree_wrap = Tree(tree0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calculate the prediction of a tree model, given a subset of features\n",
    "\n",
    "We'll implement algorithm 1 of Scott Lundberg's paper.  This is a way to use a single trained tree to estimate predictions of other trees that would be trained on a subset of the features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Bonus challenge:** Try implementing this function completely by yourself!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Bonus challenge: implement f_given_S on your own!\"\"\"\n",
    "def f_given_S(tree, S, x):\n",
    "    \"\"\"\n",
    "    tree: the custom Tree class\n",
    "    S: set of integers reprenting features that are used to train the model.\n",
    "    x: sample observation on which to calculate the prediction of the model.\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you prefer some starter code:** You can also use the starter code below to implement the algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "You can use this starter code if you get stuck while implementing the algorithm on your own\n",
    "\"\"\"\n",
    "def f_given_S(tree, S, x):\n",
    "    \"\"\"\n",
    "    tree: the custom Tree class\n",
    "    S: set of integers reprenting features that are used to train the model.\n",
    "    x: sample observation on which to calculate the prediction of the model.\n",
    "    \"\"\"\n",
    "    # the root node is at index 0 in the list\n",
    "    starting_node = 0\n",
    "    # When starting at the root node, the weight assigned is 1 (100%).\n",
    "    starting_weight = 1\n",
    "\n",
    "    def traverse_tree(node_i, weight):\n",
    "        \"\"\"\n",
    "        nested function that will be called recursively\n",
    "        \"\"\"\n",
    "        if tree.is_leaf(node_i):\n",
    "            # TODO: multiply the weight times the node prediction\n",
    "            \n",
    "        else: # is internal node\n",
    "            feature_index = tree.feature_that_split_node_i(node_i)\n",
    "            feature_value = x[feature_index]\n",
    "            threshold = tree.threshold_at_node_i(node_i)\n",
    "            left_child = tree.left_child(node_i)\n",
    "            right_child = tree.right_child(node_i)\n",
    "            if feature_index in S:\n",
    "                if feature_value <= threshold:\n",
    "                    # TODO: recursively traverse the left subtree\n",
    "                    \n",
    "                else:\n",
    "                    # TODO: recursively traverse the right subtree\n",
    "                    \n",
    "            else: #feature is not in subset S\n",
    "                # TODO: traverse the left sub-tree,\n",
    "                # and update the weight to be the current weight times the proportion of samples in the left child node\n",
    "                \n",
    "                \n",
    "                # TODO: traverse the right sub-tree,\n",
    "                # and update the weight to be the current weight times the proportion of samples in the left child node\n",
    "                \n",
    "                \n",
    "                # TODO: return the sum of both sub-trees\n",
    "                \n",
    "    \n",
    "    # start traversing the tree\n",
    "    return traverse_tree(starting_node,starting_weight)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Try out the function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sample_values = np.array([1,1,1])\n",
    "S = set([2]) # if you input only feature 2, expect 0.25\n",
    "f_given_S(tree_wrap, S, sample_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### try empty feature set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "S = set([]) #for empty set, expect output to be 0.25\n",
    "\n",
    "f_given_S(tree_wrap, S, sample_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## calculate weight on marginal contribution\n",
    "\n",
    "We'll calculate the weight placed on the marginal contribution of the feature:\n",
    "$ \\frac{|S|! (M - |S| -1 )!}{M!}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from math import factorial\n",
    "def weight_on_marginal_contribution(size_S, M):\n",
    "    \"\"\"\n",
    "    size_S: number of features in set S\n",
    "    M: numer of total features\n",
    "    \"\"\"\n",
    "    # TODO\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## calculate marginal contribution for a single feature \n",
    "\n",
    "$ f(S \\cup i) - f(S)$  \n",
    "\n",
    "Fill in the function that takes in the custom Tree object, a sample data point, a list containing the set of features in set S (excluding feature \"i\"), and also the feature for which we want to calculate the marginal contribution.  Keep in mind that set S excludes feature \"i\".\n",
    "\n",
    "**Hint:** The python `set` class has the member function `.add`.  \n",
    "Note that you may need to use the `.copy` function as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def marginal_contribution_of_feature(tree,x,S,feature_i):\n",
    "    \"\"\"\n",
    "    tree: the custom Tree object that wraps the sklearn tree_ object\n",
    "    x: a sample observation that contains all features\n",
    "    S: a list of integers, specifying the features in subset S, excluding feature i.\n",
    "    feature_i: an integer specifying the feature for which we're calculating the marginal contribution.\n",
    "    \"\"\"\n",
    "    # TODO: create the union of S and i\n",
    "    \n",
    "    # TODO: return the difference in prediction with feature \"i\" and without feature \"i\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Try it out\n",
    "\n",
    "We'll try out the `marginal_contribution_of_feature` function.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_i = 0 #index of feature for which we want to calculate its marginal contribution\n",
    "S = set([1]) # Set that excludes feature i\n",
    "x = X[0] #grab one data point to calculate marginal contribution on\n",
    "marginal_contribution_of_feature(tree_wrap,x,S,feature_i) # we expect 0.5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The marginal contribution of feature 0 is 0.5.  This means that the prediction of the model when feature 0 is present is 0.5 greater than the model's prediction when it only has feature 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate all subsets\n",
    "\n",
    "Fill in a function that generates all possible subsets S.  \n",
    "We'll use `itertools.combinations`, which takes in a list, and also the size of each subset.  It returns an iterable object that contains tuples of all the combinations.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import combinations\n",
    "\n",
    "# try out the combinations function\n",
    "tmp_combo = combinations([1,2,3,4],2)\n",
    "for subset in tmp_combo:\n",
    "    print(subset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### fill in the function generate_all_subsets\n",
    "\n",
    "Keep in mind that since the iterable returned by `combinations` holds tuples, we can create sets out of the tuples by using `set(the_tuple_object)`.  We'll store the S sets as `set` types, since we defined the `f_given_S` function to take S as a type `set`.\n",
    "\n",
    "Remember to also include the empty set.  We can do this with `set([None])`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "** Bonus challenge:** Try implementing this function on your own!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Try implementing on your own!\n",
    "\"\"\"\n",
    "def generate_all_subsets(S):\n",
    "    \"\"\"\n",
    "    S: set of integers representing the features in set S\n",
    "    \"\"\"\n",
    "    pass\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you prefer some starter code: ** You can also use the starter code if you get stuck"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Starter code version\n",
    "\"\"\"\n",
    "\n",
    "def generate_all_subsets(S):\n",
    "    \"\"\"\n",
    "    S: set of integers representing the features in set S\n",
    "    \"\"\"\n",
    "    sets_l = []\n",
    "    for size in range(1,len(S)+1):\n",
    "        # TODO: create a combinations iterable\n",
    "\n",
    "        # TODO: loop thru the combo iterable and append sets to the sets_l list\n",
    "            \n",
    "    # TODO: also include the empty set\n",
    "\n",
    "    return sets_l"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Try out the function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "S = set([0,1,2])\n",
    "generate_all_subsets(S)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate shapley value for one feature\n",
    "\n",
    "Implement a function that calculates the shapley value for a single feature, by iterating across all subsets S."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Bonus challenge:** Try implementing the function yourself!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def shap_feature_i(tree,x,feature_i):\n",
    "    \"\"\"\n",
    "    tree: the custom Tree object that wraps the tree_ from sklearn.\n",
    "    x: a sample data point\n",
    "    i: the feature for which we want to calculate its shapley value.\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you prefer starter code:** You can also fill in the starter code below if you prefer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def shap_feature_i(tree,x,feature_i):\n",
    "    \"\"\"\n",
    "    tree: the custom Tree object that wraps the tree_ from sklearn.\n",
    "    x: a sample data point\n",
    "    i: the feature for which we want to calculate its shapley value.\n",
    "    \"\"\"\n",
    "    all_features = set(np.arange(0,x.shape[0]))\n",
    "    all_features_minus_i = all_features.copy()\n",
    "    all_features_minus_i.remove(feature_i) #remove feature \"i\"\n",
    "\n",
    "    # TODO: generate all subsets S\n",
    "    S_list = # ...\n",
    "    phi = # ...\n",
    "    num_features_total = len(all_features)\n",
    "    \n",
    "    # iterate thru S_list\n",
    "    for S in S_list:\n",
    "        # TODO: calculate the number of features stored in S\n",
    "        # Handle the special case where S contains None, because the number of features should be 0 in that case\n",
    "        if None in S and len(S) == 1:\n",
    "            # ...\n",
    "        else:\n",
    "            # ...\n",
    "            \n",
    "        # TODO: increment phi by the weigth on the marginal contribution * marginal contribution of \"i\"\n",
    "        phi += # ...\n",
    "    \n",
    "    return np.round(phi,decimals=3)\n",
    "    return phi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Try out the function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = X[0]\n",
    "shap_feature_i(tree_wrap,x,0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate feature importance of all features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def shap_tree_explainer(tree_wrap, x):\n",
    "    shap_l = []\n",
    "    for i,v in enumerate(x):\n",
    "        shap_l.append(shap_feature_i(tree_wrap, x,i))\n",
    "    \n",
    "    return np.array(shap_l)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Take an sklearn tree model and calculate feature importance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def shap_tree_model_explainer(tree_model, x):\n",
    "    sklearn_tree = tree_model.tree_\n",
    "    tree_wrap = Tree(tree_model.tree_)\n",
    "    \n",
    "    return shap_tree_explainer(tree_wrap,x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Additive Feature Attribution\n",
    "\n",
    "Additive feature attribution methods are simple models that are used to explain complex models.  You can see the formula in the same paper on page 3."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![alt text](./tree_shap_images/tree_shap_img_13.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Explanation\n",
    "\n",
    "Think of our tree model as the complex model that we wish to explain with a simple, linear model.  The above formula is saying that we can take a single data point with the three features, and the complex model makes a prediction.  We can divide up that prediction among the three features, based on how important those features are to the complex model's prediction, and also based on whether the feature values push the prediction in the positive or negative direction.\n",
    "\n",
    "This is related to the ideas of coalition game theory.  Imagine a team of basketball players scores 100 points in a game. We are trying to attribute part of the final score to each member of the team, based on their contributions, or \"importance.\"\n",
    "\n",
    "When the contributions of each feature are added up to equal the complex model's prediction, this linear combination of contributions is the simple linear model that is being used to explain the complex model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example\n",
    "\n",
    "Let's say that we've trained a complex model on 3 features.  If it's given no inputs to make a prediction, then its prediction would be the equal weighted average of all its training samples.  \n",
    "Let's say that equal weighted average of the training labels is **100**.  In other words, if a model is given no features and asked to make a prediction, it would predict 100, which is the expected value based on the training labels.\n",
    "\n",
    "Now, let's say we give the complex model a single sample observation, with all three features, and the complex model gives a prediction of **200**.  \n",
    "\n",
    "The additive feature attribution model may assign feature importances to the three features like this:  \n",
    "feature 0: +50  \n",
    "feature 1: +90  \n",
    "feature 2: -40  \n",
    "\n",
    "So this is saying that feature 0 pushed the complex model's prediction up by 50, feature 1 pushed the complex model's prediction up by 90, and feature 2 pushed the model's prediction down by 40.  The end result was to go from the expected value of 100 to the prediction of 200.\n",
    "\n",
    "The shapley values that we just calculated are these values that push the model's prediction from the average of the training labels to the model's final prediction.  When we add up the shapley values for all the features, they add up to the model's prediction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare this implementation with shap library"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### test 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.array([0,0,0])\n",
    "shap_values = shap_tree_model_explainer(model,x)\n",
    "expected_value = np.mean(y)\n",
    "print(f\"my shap function: {shap_values}\")\n",
    "print(f\"shap library:     {shap.TreeExplainer(model).shap_values(x)}\")\n",
    "print(f\"expected value (average of labels in y) {expected_value}\")\n",
    "print(f\"sum of shapley values for all features: {np.sum(shap_values)}\")\n",
    "print(f\"model prediction {model.predict(x.reshape(1,-1))}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Quiz\n",
    "\n",
    "How do you interpret the shapley values of each feature when features 0,1 and 2 are all 0?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### test 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.array([1,0,0])\n",
    "shap_values = shap_tree_model_explainer(model,x)\n",
    "print(f\"my shap function: {shap_values}\")\n",
    "print(f\"shap library:     {shap.TreeExplainer(model).shap_values(x)}\")\n",
    "print(f\"expected value (average of labels in y) {expected_value}\")\n",
    "print(f\"sum of shapley values for all features: {np.sum(shap_values)}\")\n",
    "print(f\"model prediction {model.predict(x.reshape(1,-1))}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Quiz\n",
    "\n",
    "How do you interpret the shapley values of each feature when feature 0 is 1 and the other features are 0?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### test 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.array([1,1,0])\n",
    "shap_values = shap_tree_model_explainer(model,x)\n",
    "print(f\"my shap function: {shap_values}\")\n",
    "print(f\"shap library:     {shap.TreeExplainer(model).shap_values(x)}\")\n",
    "print(f\"expected value (average of labels in y) {expected_value}\")\n",
    "print(f\"sum of shapley values for all features: {np.sum(shap_values)}\")\n",
    "print(f\"model prediction {model.predict(x.reshape(1,-1))}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Quiz\n",
    "\n",
    "How do we interpret the shapley values when feature 0 and 1 are both 1?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Answer\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solution\n",
    "\n",
    "[solution notebook](tree_shap_solution.ipynb)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "env_p7a",
   "language": "python",
   "name": "env_p7a"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
